Send RetryInfo on OTel Timeouts #4294
@dlvenable this PR shows how to add a
@KarstenSchnitter, what do you think about this configuration?
Here is a code example of a nested configuration: lines 45 to 47 in 6a30c6f.
The actual implementation is SqsOptions, which is another simple POJO class.
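A nested configuration of this kind is simply a POJO held as a field of the outer plugin configuration. The sketch below illustrates that shape for retry settings. The class names `SourceConfig` and `RetryInfoOptions`, their fields, the YAML keys mentioned in the comment, and the defaults (100 ms minimum, 2 s maximum) are all assumptions for illustration, not the merged code. In the real plugins the fields would presumably be bound from the pipeline YAML with Jackson annotations such as `@JsonProperty`; those are omitted so the sketch has no dependencies.

```java
import java.time.Duration;

public class NestedConfigSketch {

    // Hypothetical nested options POJO, in the spirit of SqsOptions.
    // Would correspond to an assumed YAML block like: retry_info: { min_delay: ..., max_delay: ... }
    public static final class RetryInfoOptions {
        private Duration minDelay = Duration.ofMillis(100); // assumed default
        private Duration maxDelay = Duration.ofSeconds(2);  // assumed default

        public Duration getMinDelay() { return minDelay; }
        public Duration getMaxDelay() { return maxDelay; }
        public void setMinDelay(Duration minDelay) { this.minDelay = minDelay; }
        public void setMaxDelay(Duration maxDelay) { this.maxDelay = maxDelay; }

        // Sanity check: a positive minimum that does not exceed the maximum.
        public boolean isValid() {
            return !minDelay.isZero() && !minDelay.isNegative()
                    && minDelay.compareTo(maxDelay) <= 0;
        }
    }

    // Hypothetical outer source configuration that embeds the nested options.
    public static final class SourceConfig {
        // Falls back to the defaults when the nested block is absent.
        private RetryInfoOptions retryInfo = new RetryInfoOptions();

        public RetryInfoOptions getRetryInfo() { return retryInfo; }
        public void setRetryInfo(RetryInfoOptions retryInfo) { this.retryInfo = retryInfo; }
    }

    public static void main(String[] args) {
        SourceConfig config = new SourceConfig();
        config.getRetryInfo().setMaxDelay(Duration.ofSeconds(5));
        System.out.println(config.getRetryInfo().getMinDelay() + " .. " + config.getRetryInfo().getMaxDelay());
    }
}
```

Keeping the nested object non-null by default means callers never need a null check when the user omits the block.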
@KarstenSchnitter, what do you have remaining to make this PR ready for review? We did discuss making it configurable, but is there anything else to add?
I am mostly lacking time to make the required changes 😉:
DataPrepper sends `RESOURCE_EXHAUSTED` gRPC responses whenever a buffer is full or a circuit breaker is active. These statuses do not contain a RetryInfo. In the OpenTelemetry protocol (OTLP), this signals a non-retryable error, which leads to dropped messages, e.g. in the OTel Collector. To apply proper back pressure in these scenarios, a RetryInfo is added to the status. Signed-off-by: Karsten Schnitter <[email protected]>
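To see why the missing RetryInfo matters, the sketch below encodes a simplified reading of the OTLP specification's retry rules for gRPC: most status codes have a fixed retryable or non-retryable classification, while `RESOURCE_EXHAUSTED` is retryable only if the server signals that recovery is possible, which it does by attaching RetryInfo. The local `Code` enum mirrors the gRPC status code names but is a stand-in, not `io.grpc.Status.Code`, and real clients such as the OTel Collector exporters may differ in details.

```java
import java.util.EnumSet;
import java.util.Set;

public class OtlpRetryability {

    // Local stand-in for the gRPC status codes (names mirror the gRPC spec).
    public enum Code {
        OK, CANCELLED, UNKNOWN, INVALID_ARGUMENT, DEADLINE_EXCEEDED, NOT_FOUND,
        ALREADY_EXISTS, PERMISSION_DENIED, RESOURCE_EXHAUSTED, FAILED_PRECONDITION,
        ABORTED, OUT_OF_RANGE, UNIMPLEMENTED, INTERNAL, UNAVAILABLE, DATA_LOSS, UNAUTHENTICATED
    }

    // Codes an OTLP client treats as retryable regardless of status details.
    private static final Set<Code> ALWAYS_RETRYABLE = EnumSet.of(
            Code.CANCELLED, Code.DEADLINE_EXCEEDED, Code.ABORTED,
            Code.OUT_OF_RANGE, Code.UNAVAILABLE, Code.DATA_LOSS);

    public static boolean isRetryable(Code code, boolean hasRetryInfo) {
        if (code == Code.RESOURCE_EXHAUSTED) {
            // Retryable only when the server attaches RetryInfo to signal recovery is possible.
            return hasRetryInfo;
        }
        return ALWAYS_RETRYABLE.contains(code);
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(Code.RESOURCE_EXHAUSTED, false)); // false: the client drops the data
        System.out.println(isRetryable(Code.RESOURCE_EXHAUSTED, true));  // true: the client backs off and retries
    }
}
```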
Implementation of exponential backoff. The idea is to start with a minimum delay on the first timeout or circuit breaker activation. If the next such event happens within twice the last delay after the previous event, double the delay until a maximum delay is reached. From then on, use the maximum delay until a sufficiently long period (the maximum delay) passes without an event; then reset the delay to the minimum. TODO: Make minimum and maximum delay configurable. Signed-off-by: Karsten Schnitter <[email protected]>
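The backoff rule described in this commit can be sketched as a small stateful calculator. This is one possible reading, not the merged implementation: the class name `RetryDelayCalculator`, its method, the inclusive (`<=`) window boundaries, and resetting to the minimum whenever an event misses the doubling window below the cap are assumptions. The caller passes the event time explicitly, which keeps the sketch deterministic and easy to test; a real source would presumably read a clock.

```java
import java.time.Duration;
import java.time.Instant;

public class RetryDelayCalculator {
    private final Duration minDelay;
    private final Duration maxDelay;

    private Instant lastEvent;  // time of the previous timeout or circuit breaker event
    private Duration lastDelay; // delay handed out for that event

    public RetryDelayCalculator(Duration minDelay, Duration maxDelay) {
        if (minDelay.isZero() || minDelay.isNegative() || minDelay.compareTo(maxDelay) > 0) {
            throw new IllegalArgumentException("require 0 < minDelay <= maxDelay");
        }
        this.minDelay = minDelay;
        this.maxDelay = maxDelay;
    }

    /** Records an event at {@code now} and returns the delay to advertise in RetryInfo. */
    public synchronized Duration nextDelay(Instant now) {
        Duration next;
        if (lastEvent == null) {
            next = minDelay; // first event: start at the minimum
        } else {
            Duration gap = Duration.between(lastEvent, now);
            // Below the cap, an event within twice the last delay doubles the delay.
            // At the cap, the delay stays at maxDelay until a quiet period longer than maxDelay.
            Duration window = lastDelay.equals(maxDelay) ? maxDelay : lastDelay.multipliedBy(2);
            if (gap.compareTo(window) <= 0) {
                Duration doubled = lastDelay.multipliedBy(2);
                next = doubled.compareTo(maxDelay) > 0 ? maxDelay : doubled;
            } else {
                next = minDelay; // quiet long enough: reset
            }
        }
        lastEvent = now;
        lastDelay = next;
        return next;
    }

    public static void main(String[] args) {
        RetryDelayCalculator calc = new RetryDelayCalculator(Duration.ofMillis(100), Duration.ofMillis(800));
        Instant t = Instant.EPOCH;
        for (long step : new long[] {0, 150, 300, 500, 900}) {
            t = t.plusMillis(step);
            System.out.println(calc.nextDelay(t).toMillis());
        }
    }
}
```

For example, with a 100 ms minimum and an 800 ms maximum, events at t = 0, 150, 450 and 950 ms yield delays of 100, 200, 400 and 800 ms; a quiet gap longer than 800 ms after that resets the delay to 100 ms.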
Updated from 55b91da to 85a7a18.
(cherry picked from commit 0d45f77) Signed-off-by: Tomas Longo <[email protected]>
(cherry picked from commit f8ac48e) Signed-off-by: Tomas Longo <[email protected]>
(cherry picked from commit ff675dc) Signed-off-by: Tomas Longo <[email protected]>
(cherry picked from commit 43ba7ee) Signed-off-by: Tomas Longo <[email protected]>
(cherry picked from commit 1f90615) Signed-off-by: Tomas Longo <[email protected]>
(cherry picked from commit 2977c1f) Signed-off-by: Tomas Longo <[email protected]>
(cherry picked from commit 473db0e) Signed-off-by: Tomas Longo <[email protected]>
(cherry picked from commit 6ef9b7e) Signed-off-by: Tomas Longo <[email protected]>
(cherry picked from commit 091e9a6) Signed-off-by: Tomas Longo <[email protected]>
(cherry picked from commit a588b09) Signed-off-by: Tomas Longo <[email protected]>
Signed-off-by: Tomas Longo <[email protected]>
Add RetryInfo Configuration
I got help from Tomas Longo, who provided the missing configuration and tests. We also tested that the RetryInfo is correctly picked up by the OpenTelemetry Collector. With this change, Data Prepper exerts back pressure when the circuit breakers are active.
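A minimal server-side sketch of that behavior: when the buffer is full or a circuit breaker is open, reject with `RESOURCE_EXHAUSTED` and attach a retry delay instead of returning a bare status. `Rejection`, `checkCapacity`, and the delay supplier are hypothetical names. With grpc-java, the real status would typically be built as a `com.google.rpc.Status` carrying a packed `com.google.rpc.RetryInfo` detail and converted with `StatusProto.toStatusRuntimeException`; that wiring is omitted here to keep the sketch dependency-free, and Data Prepper's own server stack may do it differently.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

public class RejectionSketch {

    // Hypothetical stand-in for a gRPC status with a google.rpc.RetryInfo detail.
    public static final class Rejection {
        public final String statusCode;   // gRPC status code name
        public final Duration retryDelay; // would populate RetryInfo.retry_delay

        Rejection(String statusCode, Duration retryDelay) {
            this.statusCode = statusCode;
            this.retryDelay = retryDelay;
        }
    }

    // Returns empty if the request can be accepted; otherwise a retryable rejection.
    public static Optional<Rejection> checkCapacity(boolean bufferFull,
                                                    boolean circuitBreakerOpen,
                                                    Supplier<Duration> retryDelay) {
        if (!bufferFull && !circuitBreakerOpen) {
            return Optional.empty();
        }
        // Without a retry delay, OTLP clients treat RESOURCE_EXHAUSTED as permanent and drop data.
        return Optional.of(new Rejection("RESOURCE_EXHAUSTED", retryDelay.get()));
    }

    public static void main(String[] args) {
        checkCapacity(false, true, () -> Duration.ofMillis(200))
                .ifPresent(r -> System.out.println(r.statusCode + " retry after " + r.retryDelay.toMillis() + " ms"));
    }
}
```

Taking the delay as a `Supplier` keeps the rejection logic independent of how the delay is computed, e.g. a fixed value or an exponential backoff.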
Signed-off-by: Tomas Longo <[email protected]>
Add java time module to tests
Signed-off-by: Tomas Longo <[email protected]>
Updated from fe555e6 to b29246a.
There was a slight issue with the initialisation of the
Thank you @KarstenSchnitter, this should be a good improvement for OTel clients.
Co-authored-by: David Venable <[email protected]> Signed-off-by: Karsten Schnitter <[email protected]>
Co-authored-by: David Venable <[email protected]> Signed-off-by: Karsten Schnitter <[email protected]>
Signed-off-by: Karsten Schnitter <[email protected]>
@dlvenable I renamed the tests. Can you have another look? I think the build failures are caused by other components, not by this changeset.
Thank you @KarstenSchnitter !
Merged commit 2595076 into opensearch-project:main
Description
DataPrepper sends `RESOURCE_EXHAUSTED` gRPC responses whenever a buffer is full or a circuit breaker is active. These statuses do not contain a RetryInfo. In the OpenTelemetry protocol (OTLP), this signals a non-retryable error, which leads to dropped messages, e.g. in the OTel Collector. To apply proper back pressure in these scenarios, a RetryInfo is added to the status.
Issues Resolved
Resolves #4119
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.